{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Find the Missing Element\n",
    "\n",
    "## Problem\n",
    "\n",
    "Consider an array of non-negative integers. A second array is formed by shuffling the elements of the first array and deleting a random element. Given these two arrays, find which element is missing in the second array. \n",
    "\n",
    "Here is an example input, the first array is shuffled and the number 5 is removed to construct the second array.\n",
    "\n",
    "Input:\n",
    "    \n",
    "    finder([1,2,3,4,5,6,7],[3,7,2,1,4,6])\n",
    "\n",
    "Output:\n",
    "\n",
    "    5 is the missing number\n",
    "\n",
    "## Solution\n",
    "\n",
    "The naive solution is go through every element in the second array and check whether it appears in the first array. Note that there may be duplicate elements in the arrays so we should pay special attention to it. The complexity of this approach is O(N^2), since we would need two for loops.\n",
    "\n",
    "A more efficient solution is to sort the first array, so while checking whether an element in the first array appears in the second, we can do binary search (we'll learn about binary search in more detail in a future section). But we should still be careful about duplicate elements. The complexity is O(NlogN). \n",
    "\n",
    "If we don’t want to deal with the special case of duplicate numbers, we can sort both arrays and iterate over them simultaneously. Once two iterators have different values we can stop. The value of the first iterator is the missing element. This solution is also O(NlogN). Here is the solution for this approach:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def finder(arr1,arr2):\n",
    "    \n",
    "    # Sort the arrays\n",
    "    arr1.sort()\n",
    "    arr2.sort()\n",
    "    \n",
    "    # Compare elements in the sorted arrays\n",
    "    for num1, num2 in zip(arr1,arr2):\n",
    "        if num1!= num2:\n",
    "            return num1\n",
    "    \n",
    "    # Otherwise return last element\n",
    "    return arr1[-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr1 = [1,2,3,4,5,6,7]\n",
    "arr2 = [3,7,2,1,4,6]\n",
    "finder(arr1,arr2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In most interviews, you would be expected to come up with a linear time solution. We can use a hashtable and store the number of times each element appears in the second array. Then for each element in the first array we decrement its counter. Once hit an element with zero count that’s the missing element. Here is this solution: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "import collections\n",
    "\n",
    "def finder2(arr1, arr2): \n",
    "    \n",
    "    # Using default dict to avoid key errors\n",
    "    d=collections.defaultdict(int) \n",
    "    \n",
    "    # Add a count for every instance in Array 1\n",
    "    for num in arr2:\n",
    "        d[num]+=1 \n",
    "    \n",
    "    # Check if num not in dictionary\n",
    "    for num in arr1: \n",
    "        if d[num]==0: \n",
    "            return num \n",
    "        \n",
    "        # Otherwise, subtract a count\n",
    "        else: d[num]-=1 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "arr1 = [5,5,7,7]\n",
    "arr2 = [5,7,7]\n",
    "\n",
    "finder2(arr1,arr2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible solution is computing the sum of all the numbers in arr1 and arr2, and subtracting arr2’s sum from array1’s sum. The difference is the missing number in arr2. However, this approach could be problematic if the arrays are too long, or the numbers are very large. Then overflow will occur while summing up the numbers.\n",
    "\n",
    "By performing a very clever trick, we can achieve linear time and constant space complexity without any problems. Here it is: initialize a variable to 0, then [XOR](https://en.wikipedia.org/wiki/Exclusive_or) every element in the first and second arrays with that variable. In the end, the value of the variable is the result, missing element in array2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def finder3(arr1, arr2): \n",
    "    result=0 \n",
    "    \n",
    "    # Perform an XOR between the numbers in the arrays\n",
    "    for num in arr1+arr2: \n",
    "        result^=num \n",
    "        print result\n",
    "        \n",
    "    return result "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5\n",
      "0\n",
      "7\n",
      "0\n",
      "5\n",
      "2\n",
      "5\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "finder3(arr1,arr2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Test Your Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ALL TEST CASES PASSED\n"
     ]
    }
   ],
   "source": [
    "\"\"\"\n",
    "RUN THIS CELL TO TEST YOUR SOLUTION\n",
    "\"\"\"\n",
    "from nose.tools import assert_equal\n",
    "\n",
    "class TestFinder(object):\n",
    "    \n",
    "    def test(self,sol):\n",
    "        assert_equal(sol([5,5,7,7],[5,7,7]),5)\n",
    "        assert_equal(sol([1,2,3,4,5,6,7],[3,7,2,1,4,6]),5)\n",
    "        assert_equal(sol([9,8,7,6,5,4,3,2,1],[9,8,7,5,4,3,2,1]),6)\n",
    "        print 'ALL TEST CASES PASSED'\n",
    "\n",
    "# Run test\n",
    "t = TestFinder()\n",
    "t.test(finder)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Good Job!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
